Foreword

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP). 

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or 
GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables. 

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under 
Foreword



This Technical Specification has been produced by the 3GPP. 

The present document defines an error concealment procedure, also termed frame substitution and muting procedure, of 
the wideband telephony speech service employing the Adaptive Multi-Rate - Wideband (AMR-WB) speech coder 
within the 3GPP system. 

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying 
change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit: 

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 or greater indicates TSG approved document under change control.

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the specification.

1 Scope



This specification defines an error concealment procedure, also termed frame substitution and muting procedure, which 
shall be used by the AMR-WB speech codec receiving end when one or more erroneous/lost speech or lost Silence 
Descriptor (SID) frames are received. 

The requirements of this document are mandatory for implementation in all networks and User Equipment (UEs)
capable of supporting the AMR-WB speech codec. It is not mandatory to follow the bit exact implementation outlined 
in this document and the corresponding C source code. 



2 Normative references



The following documents contain provisions which, through reference in this text, constitute provisions of the present 
document. 

• References are either specific (identified by date of publication, edition number, version number, etc.) or 
non-specific. 

• For a specific reference, subsequent revisions do not apply. 

• For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including 
a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same 
Release as the present document. 

[1] 3GPP TS 26.202: "AMR Wideband Speech Codec; Interface to RAN".

[2] 3GPP TS 26.190: "AMR Wideband Speech Codec; Transcoding functions".

[3] 3GPP TS 26.193: "AMR Wideband Speech Codec; Source Controlled Rate operation".

[4] 3GPP TS 26.201: "AMR Wideband Speech Codec; Frame structure".



3 Definitions and abbreviations 

3.1 Definitions 

For the purposes of this document, the following definition applies: 

N-point median operation: Consists of sorting the N elements belonging to the set for which the median operation is 
to be performed in an ascending order according to their values, and selecting the (int (N/2) + 1) -th largest value of the 
sorted set as the median value. 

Further definitions of terms used in this document can be found in the references. 
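
The N-point median operation defined above can be illustrated with a minimal C sketch. The function name median_n and the limit MEDIAN_MAX_N are hypothetical and do not come from the reference C source code. The sketch assumes that position int(N/2) + 1 is counted from the start of the ascending sorted set; for odd N, such as the 5-point median used in clause 6, this is the middle element in either reading.

```c
#include <stddef.h>

/* Hypothetical upper bound on N for this sketch. */
#define MEDIAN_MAX_N 16

/* N-point median: sort a copy of the input in ascending order and
 * return the element at 1-based position int(N/2) + 1.
 * Assumes 1 <= n <= MEDIAN_MAX_N. */
double median_n(const double *x, size_t n)
{
    double sorted[MEDIAN_MAX_N];
    size_t i, j;

    /* Insertion sort into the local buffer. */
    for (i = 0; i < n; i++) {
        double v = x[i];
        for (j = i; j > 0 && sorted[j - 1] > v; j--)
            sorted[j] = sorted[j - 1];
        sorted[j] = v;
    }
    /* 1-based position int(N/2) + 1 is 0-based index N/2. */
    return sorted[n / 2];
}
```

For the five values 0.9, 0.1, 0.5, 0.7, 0.3 the sorted set is 0.1, 0.3, 0.5, 0.7, 0.9 and the third element, 0.5, is returned.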

3.2 Abbreviations

For the purposes of this document, the following abbreviations apply: 

AMR-WB Adaptive Multi Rate - WideBand 

AN Access Network 

BFI Bad Frame Indication from AN 

BSI_netw Bad Sub-block Indication obtained from AN interface CRC checks 

prevBFI Bad Frame Indication of previous frame 

RX Receive 

SCR Source Controlled Rate (operation) 

SID Silence Descriptor frame (Background noise) 

CRC Cyclic Redundancy Check 

ECU Error Concealment Unit 

BFH Bad Frame Handling 

medianN N-point median operation 



4 General



The purpose of the error concealment procedure is to conceal the effect of erroneous/lost AMR-WB speech frames. The
purpose of muting the output in the case of several erroneous/lost frames is to indicate the breakdown of the channel to
the user and to avoid generating possibly annoying sounds as a result of the error concealment procedure.

The network shall indicate erroneous/lost speech or lost SID frames by setting the RX_TYPE values [3] to 
SPEECH_BAD, SID_BAD or SPEECH_LOST. If these flags are set, the speech decoder shall perform parameter 
substitution to conceal errors. 

The example solution provided in clause 6 applies only to bad frame handling on a complete speech frame basis.
Subframe-based error concealment may be derived using similar methods.



5 Requirements 

5.1 Error detection 

If the most sensitive bits of the AMR-WB speech data (class A in [4]) are received in error, the network shall indicate
RX_TYPE = SPEECH_BAD, in which case the BFI flag is set. When the frame is not received, the network shall
indicate RX_TYPE = SPEECH_LOST, in which case the BFI flag is set as well. If a SID frame is received in error,
the network shall indicate RX_TYPE = SID_BAD.

5.2 Erroneous or lost speech frames 

Normal decoding of erroneous/lost speech frames would result in very unpleasant noise effects. In order to improve the 
subjective quality, erroneous/lost speech frames shall be substituted with either a repetition or an extrapolation of the 
previous good speech frame(s). This substitution is done so that the output level gradually decreases, resulting in
silence at the output. Subclause 6 provides an example solution.

5.3 First lost SID frame 

A lost SID frame shall be substituted by using the SID information from earlier received valid SID frames, and the
procedure for valid SID frames shall be applied as described in [3].

5.4 Subsequent lost SID frames



When several consecutive SID frames are lost, a muting technique shall be applied to the comfort noise so that the
output level gradually decreases. For further lost SID frames, the muting of the output shall be maintained. Subclause 6
provides example solutions.



6 Example ECU/BFH Solution



6.1 State Machine 

This example solution for substitution and muting is based on a state machine with seven states (Figure 1). 

The system starts in state 0. Each time a bad frame is detected, the state counter is incremented by one and is saturated 
when it reaches 6. Each time a good speech frame is detected, the state counter is right-shifted by one. The state 
indicates the quality of the channel: the larger the value of the state counter, the worse the channel quality is. The 
control flow of the state machine can be described by the following C code (BFI = bad frame indicator, State = state 
variable): 

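    if (BFI != 0)
    {
        /* Bad frame: move one state up, saturating at 6. */
        State = State + 1;
        if (State > 6)
            State = 6;
    }
    else
    {
        /* Good frame: right-shift (halve) the state counter. */
        State = State >> 1;
    }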

In addition to this state machine, the Bad Frame Flag from the previous frame is checked (prevBFI). The processing
depends on the value of the State variable. In states 0 and 6, the processing depends on the BFI flag.
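
As a worked illustration of the state counter, the following minimal C sketch wraps the update rule in a function. The name update_state is hypothetical and is not taken from the reference C source code.

```c
/* Hypothetical helper: advance the state counter by one frame.
 * bfi is the bad frame indicator of the current frame. */
int update_state(int state, int bfi)
{
    if (bfi != 0) {
        /* Bad frame: increment and saturate at 6. */
        state = state + 1;
        if (state > 6)
            state = 6;
    } else {
        /* Good frame: right-shift by one. */
        state = state >> 1;
    }
    return state;
}
```

Starting from state 0, the BFI sequence 1, 1, 1, 1, 1, 1, 1, 0, 0, 0 yields the states 1, 2, 3, 4, 5, 6, 6, 3, 1, 0: seven bad frames saturate the counter at 6, and three good frames are then needed to return to state 0.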
The procedure can be described as follows:

[Figure 1: state diagram with one box per state, listed here from top to bottom. Arrows in the original figure mark the transitions taken on a bad frame (BFI = 1) and on a good frame (BFI = 0).]

State 0: BFI = 0, prevBFI = 0 or 1
State 1: (BFI, prevBFI) = (1,0) or (0,1) or (0,0)
State 2: (BFI, prevBFI) = (1,1) or (1,0) or (0,1)
State 3: (BFI, prevBFI) = (1,1) or (1,0) or (0,1)
State 4: BFI = 1, prevBFI = 0 or 1
State 5: BFI = 1, prevBFI = 1
State 6: BFI = 1, prevBFI = 1

Figure 1: State machine for controlling the bad frame substitution

6.2 Substitution and muting of erroneous/lost speech frames

6.2.1 BFI = 0, prevBFI = 0, State = 0 or 1

No error is detected in the received or in the previous received speech frame. The received speech parameters are used 
normally in the speech synthesis. The current frame of speech parameters is saved. 

6.2.2 BFI = 0, prevBFI = 1, State = 0 to 3

No error is detected in the received speech frame but the previous received speech frame was bad. The LTP gain is used
normally in the speech synthesis, and the fixed codebook gain is limited relative to the value used for the last received good
subframe:



$$ g_c(n) = \begin{cases} g_{received}, & g_{received} \le 100 \text{ or } g_{received} \le g_c(n-1) \times 1.25 \\ 1.25 \cdot g_c(n-1), & \text{otherwise} \end{cases} \tag{1} $$

where

$g_{received}$ = current decoded fixed codebook gain,

$g_c(n-1)$ = fixed codebook gain used for the last good subframe (BFI = 0),

$g_c(n)$ = fixed codebook gain to be used for the current frame.

The rest of the received speech parameters are used normally in the speech synthesis. The current frame of speech 
parameters is saved. 
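
The limiting rule of equation (1) can be sketched in C as follows. The function name limit_code_gain is hypothetical, floating-point arithmetic is assumed for readability, and the comparisons are taken as non-strict; none of this is meant to reproduce the bit-exact reference code.

```c
/* Hypothetical helper for equation (1).
 * g_received: current decoded fixed codebook gain
 * g_prev:     fixed codebook gain of the last good subframe, g_c(n-1)
 * Returns the gain g_c(n) to use for the current frame. */
double limit_code_gain(double g_received, double g_prev)
{
    double limit = 1.25 * g_prev;

    /* Small gains, or gains within 1.25 times the previous gain,
     * are passed through unchanged. */
    if (g_received <= 100.0 || g_received <= limit)
        return g_received;

    /* Otherwise the gain is capped at 1.25 * g_c(n-1). */
    return limit;
}
```

For example, with g_c(n-1) = 100 a decoded gain of 200 is limited to 125, while a decoded gain of 120 is used as received.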

6.2.3 BFI = 1, prevBFI = 0 or 1, State = 1...6

An error is detected in the received speech frame and the substitution and muting procedure is started. 

6.2.3.1 LTP gain & fixed codebook gain concealment when RX_FRAMETYPE = SPEECH_BAD

The LTP gain $g_p$ and fixed codebook gain $g_c$ are replaced by attenuated values from the previous subframes:

$$ g_p = P_p(state) \cdot \mathrm{median5}(g_p(n-1), \ldots, g_p(n-5)) \tag{2} $$

$$ g_c = \begin{cases} P_c(state) \cdot \mathrm{median5}(g_c(n-1), \ldots, g_c(n-5)), & VAD\_HIST < 2 \\ \mathrm{median5}(g_c(n-1), \ldots, g_c(n-5)), & VAD\_HIST > 2 \end{cases} \tag{3} $$

where:

$g_p$ = current decoded LTP gain,

$g_c$ = current decoded fixed codebook gain,

$g_p(n-1), \ldots, g_p(n-5)$ = LTP gains used for the last 5 subframes,

$g_c(n-1), \ldots, g_c(n-5)$ = fixed codebook gains used for the last 5 subframes,

median5() = 5-point median operation,

$P_p(state)$ = attenuation factor ($P_p(1)$ = 0.98, $P_p(2)$ = 0.96, $P_p(3)$ = 0.75, $P_p(4)$ = 0.23, $P_p(5)$ = 0.05, $P_p(6)$ = 0.01),

$P_c(state)$ = attenuation factor ($P_c(1)$ = 0.98, $P_c(2)$ = 0.98, $P_c(3)$ = 0.98, $P_c(4)$ = 0.98, $P_c(5)$ = 0.98, $P_c(6)$ = 0.70),

state = state number {0..6},

VAD_HIST is the number of consecutive VAD = 0 decisions.



The higher the state value is, the more the gains are attenuated. Also the memory of the predictive fixed codebook gain 
is updated by using the average value of the past four values in the memory: 



$$ ener(0) = \frac{1}{4} \sum_{i=1}^{4} ener(n-i) \tag{4} $$



6.2.3.2 LTP gain & fixed codebook gain concealment when RX_FRAMETYPE = SPEECH_LOST

The LTP gain $g_p$ and fixed codebook gain $g_c$ are replaced by attenuated values from the previous subframes:

$$ g_p = P_p(state) \cdot \mathrm{median5}(g_p(n-1), \ldots, g_p(n-5)) \tag{5} $$

$$ g_c = \begin{cases} P_c(state) \cdot \mathrm{median5}(g_c(n-1), \ldots, g_c(n-5)), & VAD\_HIST < 2 \\ \mathrm{median5}(g_c(n-1), \ldots, g_c(n-5)), & VAD\_HIST > 2 \end{cases} \tag{6} $$

where:

$g_p$ = current decoded LTP gain,

$g_c$ = current decoded fixed codebook gain,

$g_p(n-1), \ldots, g_p(n-5)$ = LTP gains used for the last 5 subframes,

$g_c(n-1), \ldots, g_c(n-5)$ = fixed codebook gains used for the last 5 subframes,

median5() = 5-point median operation,

$P_p(state)$ = attenuation factor ($P_p(1)$ = 0.95, $P_p(2)$ = 0.90, $P_p(3)$ = 0.75, $P_p(4)$ = 0.23, $P_p(5)$ = 0.05, $P_p(6)$ = 0.01),

$P_c(state)$ = attenuation factor ($P_c(1)$ = 0.50, $P_c(2)$ = 0.25, $P_c(3)$ = 0.25, $P_c(4)$ = 0.25, $P_c(5)$ = 0.15, $P_c(6)$ = 0.01),

state = state number {0..6},

VAD_HIST is the number of consecutive VAD = 0 decisions.



The higher the state value is, the more the gains are attenuated. Also the memory of the predictive fixed codebook gain 
is updated by using the average value of the past four values in the memory: 



$$ ener(0) = \frac{1}{4} \sum_{i=1}^{4} ener(n-i) \tag{7} $$
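
The gain concealment of subclauses 6.2.3.1 and 6.2.3.2 can be sketched in C as below. All identifiers (bad_frame_type, conceal_gains, average_past_energy and the table names) are hypothetical, floating-point arithmetic is used instead of the fixed-point reference code, and index 0 of each table is an unused placeholder because no attenuation factor is listed for state 0. The text gives conditions for VAD_HIST < 2 and VAD_HIST > 2 only; treating VAD_HIST = 2 like the attenuated case is an assumption of this sketch.

```c
/* Kind of bad frame being concealed (illustrative names). */
typedef enum { FRAME_SPEECH_BAD, FRAME_SPEECH_LOST } bad_frame_type;

/* Attenuation factors indexed by state 1..6; index 0 is a placeholder. */
static const double PP_BAD[7]  = {1.0, 0.98, 0.96, 0.75, 0.23, 0.05, 0.01};
static const double PC_BAD[7]  = {1.0, 0.98, 0.98, 0.98, 0.98, 0.98, 0.70};
static const double PP_LOST[7] = {1.0, 0.95, 0.90, 0.75, 0.23, 0.05, 0.01};
static const double PC_LOST[7] = {1.0, 0.50, 0.25, 0.25, 0.25, 0.15, 0.01};

/* 5-point median: third element of the ascending sorted copy. */
static double median5(const double g[5])
{
    double s[5];
    int i, j;

    for (i = 0; i < 5; i++) {
        double v = g[i];
        for (j = i; j > 0 && s[j - 1] > v; j--)
            s[j] = s[j - 1];
        s[j] = v;
    }
    return s[2];
}

/* Replace the LTP gain and fixed codebook gain of a bad frame.
 * state must be in 1..6; gp_hist and gc_hist hold the gains used
 * for the last 5 subframes. */
void conceal_gains(bad_frame_type type, int state, int vad_hist,
                   const double gp_hist[5], const double gc_hist[5],
                   double *gp, double *gc)
{
    const double *pp = (type == FRAME_SPEECH_BAD) ? PP_BAD : PP_LOST;
    const double *pc = (type == FRAME_SPEECH_BAD) ? PC_BAD : PC_LOST;
    double gc_median = median5(gc_hist);

    *gp = pp[state] * median5(gp_hist);

    /* Assumption: VAD_HIST == 2 is handled like VAD_HIST < 2. */
    *gc = (vad_hist > 2) ? gc_median : pc[state] * gc_median;
}

/* Memory update: the new entry is the mean of the past four values. */
double average_past_energy(const double ener_hist[4])
{
    return 0.25 * (ener_hist[0] + ener_hist[1] + ener_hist[2] + ener_hist[3]);
}
```

With an LTP gain history whose median is 0.7 and a fixed codebook gain history whose median is 30, state 1 of a SPEECH_BAD frame gives approximately 0.686 and 29.4, whereas state 2 of a SPEECH_LOST frame gives approximately 0.63 and 7.5, which shows the stronger attenuation applied to lost frames.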



6.2.3.3 ISF concealment 

The past ISFs are shifted towards their partly adaptive mean: 

$$ ISF_q(i) = \alpha \cdot past\_ISF_q(i) + (1 - \alpha) \cdot ISF_{mean}(i), \quad i = 0..16 \tag{8} $$

where

$\alpha$ = 0.9,

$ISF_q(i)$ is the ISF vector for the current frame,

$past\_ISF_q(i)$ is the ISF vector from the previous frame,

$ISF_{mean}(i)$ is a combination of the adaptive mean and constant mean ISF vectors in the following manner:

$$ ISF_{mean}(i) = \beta \cdot ISF_{const\_mean}(i) + (1 - \beta) \cdot ISF_{adaptive\_mean}(i), \quad i = 0..16 \tag{9} $$

where

$\beta$ = 0.75,

$ISF_{adaptive\_mean}(i) = \frac{1}{3} \sum_{i=0}^{2} past\_ISF_q(i)$ and is updated whenever BFI = 0,

$ISF_{const\_mean}(i)$ is a vector containing the long-time average of ISF vectors.
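
The ISF substitution of equations (8) and (9) can be sketched in C as follows. The function name conceal_isf and its parameter names are hypothetical. The adaptive mean is passed in as an already computed vector, and the vector length is left to the caller, so the sketch makes no assumption about how the mean buffer is maintained.

```c
#include <stddef.h>

/* Hypothetical helper for equations (8) and (9).
 * past_isf:      ISF vector of the previous frame
 * const_mean:    long-time average ISF vector
 * adaptive_mean: adaptive mean ISF vector (updated when BFI = 0)
 * isf_out:       concealed ISF vector for the current frame
 * n:             number of vector elements */
void conceal_isf(const double *past_isf, const double *const_mean,
                 const double *adaptive_mean, double *isf_out, size_t n)
{
    const double alpha = 0.9;   /* weight of the past ISF vector  */
    const double beta = 0.75;   /* weight of the constant mean    */
    size_t i;

    for (i = 0; i < n; i++) {
        /* Equation (9): partly adaptive mean. */
        double mean = beta * const_mean[i] + (1.0 - beta) * adaptive_mean[i];
        /* Equation (8): shift the past ISF towards that mean. */
        isf_out[i] = alpha * past_isf[i] + (1.0 - alpha) * mean;
    }
}
```

For a single element with a past value of 1000, a constant mean of 2000 and an adaptive mean of 1200, the combined mean is 1800 and the concealed value is about 1080, so each bad frame moves the ISF one tenth of the way towards the mean.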



6.2.3.4 LTP-lag concealment 

The histories of five last good LTP-lags and LTP-gains are used for finding the best method to update. 

6.2.3.4.1 LTP-lag concealment when RX_FRAMETYPE = SPEECH_BAD 

The usability of the received LTP lag ($Q_{lag}$) is defined as follows. It predicts whether the received lag is most probably very
close to the one that was sent, so that its use should not introduce any bad artifacts:

$$ Q_{lag} = \begin{cases}
1, & T_{dif} < 10 \text{ and } T_{min} - 5 < T_{received} < T_{max} + 5 \\
1, & g_p(n-1) > 0.5 \text{ and } g_p(n-2) > 0.5 \text{ and } T(n-1) - 10 < T_{received} < T(n-1) + 10 \\
1, & g_{min} < 0.4 \text{ and } g_p(n-1) = g_{min} \text{ and } T_{min} < T_{received} < T_{max} \\
1, & T_{dif} < 70 \text{ and } T_{min} < T_{received} < T_{max} \\
1, & T_{mean} < T_{received} < T_{max} \\
0, & \text{otherwise}
\end{cases} \tag{10} $$

where:

$T(n-1)$ is the LTP lag from the previous good frame,

$T_{dif} = |T_{received} - T(n-1)|$,

$T_{min} = \min(T_{buffer})$,

$T_{max} = \max(T_{buffer})$,

$T_{received}$ is the received lag,

$g_p$ is the LTP gain of the current frame,

$g_p(n-1)$ is the LTP gain of the previous good frame,

$g_p(n-2)$ is the LTP gain of the frame before the previous good frame,

$T_{mean} = \mathrm{average}(T_{buffer})$.

The LTP lag value for the current frame is defined as follows:

$$ T = \begin{cases} T_{received}, & Q_{lag} = 1 \\ \frac{1}{3}(T_{max} + T_{max-1} + T_{max-2}) + RND(T_{max} - T_{max-2}), & Q_{lag} = 0 \end{cases} \tag{11} $$

where:

$T_{max} = \max(T_{buffer})$,

$T_{max-1}$ is the second largest value in $T_{buffer}$,

$T_{max-2}$ is the third largest value in $T_{buffer}$,

RND(x) is a random value generated in the range $[-x/2, +x/2]$.






6.2.3.4.2 LTP-lag concealment when RX_FRAMETYPE = SPEECH_LOST

The usability of the LTP lag from the last good frame ($Q_{lag-1}$) is defined as follows. It predicts whether that lag is most
probably very close to the one that was sent, so that its use should not introduce any bad artifacts:

$$ Q_{lag-1} = \begin{cases}
1, & g_{min} > 0.5 \text{ and } T_{dif} < 10 \\
1, & g_p(n-1) > 0.5 \text{ and } g_p(n-2) > 0.5 \\
0, & \text{otherwise}
\end{cases} \tag{12} $$

where:

$g_p(n-1)$ is the LTP gain of the previous good frame,

$g_p(n-2)$ is the LTP gain of the frame before the previous good frame.

The LTP lag value for the current frame is defined as follows:

$$ T = \begin{cases} T(n-1), & Q_{lag-1} = 1 \\ \frac{1}{3}(T_{max} + T_{max-1} + T_{max-2}) + RND(T_{max} - T_{max-2}), & Q_{lag-1} = 0 \end{cases} \tag{13} $$

where:

$T(n-1)$ is the LTP lag from the previous good frame,

$T_{max} = \max(T_{buffer})$,

$T_{max-1}$ is the second largest value in $T_{buffer}$,

$T_{max-2}$ is the third largest value in $T_{buffer}$,

RND(x) is a random value generated in the range $[-x/2, +x/2]$.
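
The substitute lag used in equations (11) and (13) when the lag is judged unusable can be sketched in C as below. The function name substitute_lag is hypothetical. The random component is supplied by the caller as a value u in [-1, +1], so that RND(x) = (x/2) * u falls in the range [-x/2, +x/2]; this reading of the range, the choice of random generator and the absence of any final rounding or clamping are assumptions of the sketch.

```c
/* Hypothetical helper: substitute LTP lag weighted towards larger lags.
 * lag_hist: the five last good LTP lags (T_buffer)
 * u:        caller-supplied random value in [-1, +1] */
double substitute_lag(const int lag_hist[5], double u)
{
    int s[5];
    int i, j;

    /* Sort a copy of the lag history in ascending order. */
    for (i = 0; i < 5; i++) {
        int v = lag_hist[i];
        for (j = i; j > 0 && s[j - 1] > v; j--)
            s[j] = s[j - 1];
        s[j] = v;
    }

    /* s[4] = T_max, s[3] = T_max-1, s[2] = T_max-2. */
    double mean_top3 = (s[4] + s[3] + s[2]) / 3.0;

    /* RND(T_max - T_max-2), scaled into [-x/2, +x/2]. */
    double rnd = 0.5 * (double)(s[4] - s[2]) * u;

    return mean_top3 + rnd;
}
```

For the lag history 60, 50, 70, 40, 80 the three largest lags are 80, 70 and 60, so the result is 70 for u = 0 and lies between 60 and 80 over the full range of u.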



6.2.4 Innovation sequence 

When RX_FRAMETYPE = SPEECH_BAD, the received fixed codebook innovation pulses from the erroneous frame 
are used as they are received. 

When RX_FRAMETYPE = SPEECH_LOST, the received fixed codebook innovation pulses from the erroneous frame
are not used and the fixed codebook innovation vector is filled with a random signal (values limited to the
range [-1, +1]).
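
Filling the innovation vector for a lost frame can be sketched in C as follows. The function name fill_random_innovation is hypothetical, and the C library rand() is used only as a stand-in for whatever random generator an implementation provides.

```c
#include <stdlib.h>

/* Hypothetical helper: fill the fixed codebook innovation vector with
 * random values limited to the range [-1, +1].
 * rand() is a placeholder generator, not the codec's own. */
void fill_random_innovation(double *code, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* rand() / RAND_MAX is in [0, 1]; map it to [-1, +1]. */
        code[i] = 2.0 * ((double)rand() / (double)RAND_MAX) - 1.0;
    }
}
```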

6.2.5 High-band gain (for 23.85 kbit/s mode) 

When RX_FRAMETYPE = SPEECH_BAD or RX_FRAMETYPE = SPEECH_LOST the received high-band energy 
parameter of the frame is not used and the estimation for the high-band gain is used instead. This means that in case of 
bad/lost speech frames, the high-band reconstruction operates in the same way for all the modes. 

6.3 Substitution and muting of lost SID frames 

In the speech decoder, a single frame classified as SID_BAD shall be substituted by the last valid SID frame information,
and the procedure for valid SID frames shall be applied. If the time between SID information updates (updates are specified
by SID_UPDATE arrivals and occasionally by SID_FIRST arrivals) is greater than one second, this shall lead to
attenuation.